This paper studies distribution estimation from contaminated data with the MoM-GAN method, which combines a generative adversarial network (GAN) with median-of-means (MoM) estimation. We use deep neural networks (DNNs) with ReLU activation functions to model the generator and the discriminator of the GAN. Theoretically, we derive a non-asymptotic error bound for the DNN-based MoM-GAN estimator, measured by integral probability metrics over the $b$-smooth H\"{o}lder class. The error bound decreases essentially as $n^{-b/p}\vee n^{-1/2}$, where $n$ and $p$ are the sample size and the dimension of the input data, respectively. We give an algorithm for the MoM-GAN method and apply it in two real applications. The numerical results show that MoM-GAN outperforms other competitive methods when dealing with contaminated data.
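As background for the abstract above, the following is a minimal sketch of the classical median-of-means estimator of a mean. It is not the paper's MoM-GAN algorithm, which the abstract does not spell out. The function name `median_of_means`, the block count, and the synthetic data are all hypothetical choices made for illustration.

```python
import random
import statistics


def median_of_means(samples, num_blocks, seed=0):
    """Classical median-of-means estimate of the mean of `samples`.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 1 <= num_blocks <= len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assign points to blocks at random
    # Strided slicing yields num_blocks blocks of nearly equal size.
    blocks = [shuffled[i::num_blocks] for i in range(num_blocks)]
    block_means = [statistics.fmean(block) for block in blocks]
    # A few corrupted blocks cannot move the median of the block means.
    return statistics.median(block_means)


# Synthetic data (assumption): 95 clean points near 1.0 plus 5 gross outliers.
rng = random.Random(42)
clean = [rng.gauss(1.0, 0.1) for _ in range(95)]
contaminated = clean + [1000.0] * 5

plain_mean = statistics.fmean(contaminated)
mom = median_of_means(contaminated, num_blocks=15)
print(f"plain mean: {plain_mean:.2f}, median of means: {mom:.2f}")
```

With 5 outliers and 15 blocks, at most 5 blocks can contain an outlier, so the median of the 15 block means is always the mean of a clean block. The plain mean, by contrast, is dragged far from 1.0.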
translated by 谷歌翻译
To accomplish punctuation restoration, most existing methods focus on introducing extra information (e.g., part-of-speech tags) or addressing the class imbalance problem. Recently, large-scale transformer-based pre-trained language models (PLMs) have been widely used with remarkable success. However, PLMs are trained on large datasets with punctuation marks, which may not transfer well to small datasets without marks, leading to suboptimal convergence. In this study, we propose a Feature Fusion two-stream framework (FF2) to bridge the gap. Specifically, one stream leverages a pre-trained language model to capture the semantic feature, while an auxiliary module captures the feature at hand. We also modify the computation of multi-head attention to encourage communication among heads. The two features, which offer different perspectives, are then aggregated to fuse information and enhance context awareness. Without additional data, experimental results on the popular IWSLT benchmark demonstrate that FF2 achieves new SOTA performance, which verifies that our approach is effective.
The task of Compositional Zero-Shot Learning (CZSL) is to recognize images of novel state-object compositions that are absent during the training stage. Previous methods of learning compositional embedding have shown effectiveness in closed-world CZSL. However, in Open-World CZSL (OW-CZSL), their performance tends to degrade significantly due to the large cardinality of possible compositions. Some recent works separately predict simple primitives (i.e., states and objects) to reduce cardinality. However, they consider simple primitives as independent probability distributions, ignoring the heavy dependence between states, objects, and compositions. In this paper, we model the dependence of compositions via feasibility and contextuality. Feasibility-dependence refers to the unequal feasibility relations between simple primitives, e.g., \textit{hairy} is more feasible with \textit{cat} than with \textit{building} in the real world. Contextuality-dependence represents the contextual variance in images, e.g., \textit{cat} shows diverse appearances under the state of \textit{dry} and \textit{wet}. We design Semantic Attention (SA) and generative Knowledge Disentanglement (KD) to learn the dependence of feasibility and contextuality, respectively. SA captures semantics in compositions to alleviate impossible predictions, driven by the visual similarity between simple primitives. KD disentangles images into unbiased feature representations, easing contextual bias in predictions. Moreover, we complement the current compositional probability model with feasibility and contextuality in a compatible format. Finally, we conduct comprehensive experiments to analyze and validate the superior or competitive performance of our model, Semantic Attention and knowledge Disentanglement guided Simple Primitives (SAD-SP), on three widely-used benchmark OW-CZSL datasets.
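Deep learning has made great strides in object detection in images. The detection accuracy and computational cost of object detection depend on the spatial resolution of an image, which may be constrained by both camera and storage considerations. Compression is often achieved by reducing spatial resolution, amplitude resolution, or sometimes both, each of which has well-known effects on performance. Detection accuracy also depends on the distance of the object of interest from the camera. Our work examines the impact of spatial and amplitude resolution, as well as object distance, on object detection accuracy and computational cost. We develop a resolution-adaptive variant of YOLOv5 (RA-YOLO), which adapts its feature pyramid and detection head to the spatial resolution of the input image. To train and evaluate this new method, we create a dataset of images with diverse spatial and amplitude resolutions by combining images from the TJU and Eurocity datasets and generating different resolutions through spatial resizing and compression. We first show that RA-YOLO achieves a good trade-off between detection accuracy and inference time over a range of spatial resolutions. We then evaluate the impact of spatial and amplitude resolution on object detection accuracy using the proposed RA-YOLO model. We demonstrate that the optimal spatial resolution, i.e., the one that leads to the highest detection accuracy, depends on the "tolerated" image size. We further assess the impact of object-to-camera distance on detection accuracy and show that higher spatial resolution enables a greater detection range. These results provide important guidelines for choosing image spatial resolution and compression settings in practical applications, based on available bandwidth, storage, desired inference time, and/or desired detection range.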
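People with blindness and low vision (pBLV) face significant challenges in locating final destinations or targeting specific objects in unfamiliar environments. Moreover, beyond initially locating and orienting toward a target object, approaching the final target from one's current position is often frustrating and challenging, especially when one drifts away from the initially planned path to avoid obstacles. In this paper, we develop a novel wearable navigation solution that provides real-time guidance so that a user can efficiently approach a target object of interest in unfamiliar environments. Our system contains two key visual computing functions: initial localization of the target object in 3D and continuous estimation of the user's trajectory, both based on 2D video captured by a low-cost monocular camera mounted on the front of the user's chest. These functions enable the system to suggest an initial navigation path, continuously update the path as the user moves, and offer timely recommendations for correcting the user's path. Our experiments demonstrate that the system operates with an error of less than 0.5 meters both outdoors and indoors. The system is entirely vision-based and needs no other sensors for navigation, and the computation can run on the Jetson processor in the wearable system to facilitate real-time navigation assistance.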
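How neural networks in the human brain represent commonsense knowledge and complete related reasoning tasks is an important research topic in neuroscience, cognitive science, psychology, and artificial intelligence. Although traditional artificial neural networks that use fixed-length vectors to represent symbols have performed well on some specific tasks, they remain black boxes that lack interpretability and are far from how humans perceive the world. Inspired by the grandmother-cell hypothesis in neuroscience, this work investigates how population coding and spike-timing-dependent plasticity (STDP) mechanisms can be integrated into the learning of spiking neural networks, and how a population of neurons can represent a symbol by guiding the completion of sequential firing between different neuron populations. The neuron populations of different communities together constitute the entire commonsense knowledge graph, forming a giant graph spiking neural network. In addition, we introduce a reward-modulated spike-timing-dependent plasticity (R-STDP) mechanism to simulate the biological reinforcement learning process and complete the related reasoning tasks accordingly, achieving comparable accuracy and faster convergence than graph convolutional artificial neural networks. For neuroscience and cognitive science, this work provides a computational modeling foundation for further exploring how the human brain represents commonsense knowledge. For artificial intelligence, it points to a direction for realizing more robust and interpretable neural networks by constructing spiking neural networks for commonsense knowledge representation and reasoning with solid biological plausibility.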
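Virtual reality (VR) videos (typically in the form of 360$^\circ$ videos) have attracted increasing attention due to the rapid development of VR technologies and the remarkable popularization of consumer-grade 360$^\circ$ cameras and displays. It is therefore important to understand how people perceive user-generated VR videos, which may suffer from commingled authentic distortions that are often localized in space and time. In this paper, we establish one of the largest 360$^\circ$ video databases, containing 502 user-generated videos with rich content and diverse distortions. We capture the viewing behaviors (i.e., scanpaths) of 139 users and collect their opinion scores under four different viewing conditions (two starting points $\times$ two exploration times). We provide a thorough statistical analysis of the recorded data, resulting in several interesting observations, such as the significant impact of viewing conditions on viewing behaviors and perceived quality. In addition, we explore other uses of our data and analysis, including the evaluation of computational models for quality assessment and saliency detection of 360$^\circ$ videos. We have made the dataset and code available at https://github.com/yao-yiru/vr-video-database.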
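Neural implicit functions have recently shown promising results for surface reconstruction from multiple views. However, current methods still suffer from excessive complexity and poor robustness when reconstructing unbounded or complex scenes. In this paper, we present RegSDF, which shows that proper point cloud supervision and geometric regularization are sufficient to produce high-quality and robust reconstruction results. Specifically, RegSDF takes an additional oriented point cloud as input and optimizes a signed distance field and a surface light field within a differentiable rendering framework. We also introduce two critical regularizations. The first is a Hessian regularization that smoothly diffuses the signed distance values across the entire distance field given noisy and incomplete input. The second is a minimal surface regularization that compactly interpolates and extrapolates the missing geometry. Extensive experiments are conducted on the DTU, BlendedMVS, and Tanks and Temples datasets. Compared with recent neural surface reconstruction methods, RegSDF is able to reconstruct surfaces even for open scenes with complex topologies and unstructured camera trajectories.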
In recent years, interest has arisen in using machine learning to improve the efficiency of automatic medical consultation and enhance patient experience. In this article, we propose two frameworks to support automatic medical consultation, namely doctor-patient dialogue understanding and task-oriented interaction. We create a new large medical dialogue dataset with multi-level fine-grained annotations and establish five independent tasks: named entity recognition, dialogue act classification, symptom label inference, medical report generation, and diagnosis-oriented dialogue policy. We report a set of benchmark results for each task, which demonstrate the usability of the dataset and set a baseline for future studies. Both code and data are available from https://github.com/lemuria-wchen/imcs21.
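Policy evaluation based on A/B testing has attracted considerable interest in digital marketing, but such evaluation in ride-sourcing platforms (e.g., Uber and Didi) has not been well studied, primarily because of the complex structure of their temporally and/or spatially dependent experiments. Motivated by policy evaluation in ride-sourcing platforms, the aim of this paper is to establish causal relationships between a platform's policies and outcomes of interest under a switchback design. We propose a novel potential outcome framework based on a temporal varying coefficient decision process (VCDP) model to capture the dynamic treatment effects in temporally dependent experiments. We further characterize the average treatment effect by decomposing it into the sum of a direct effect (DE) and an indirect effect (IE). We develop estimation and inference procedures for both DE and IE. Furthermore, we propose a spatio-temporal VCDP to handle spatio-temporally dependent experiments. For both VCDP models, we establish the statistical properties (e.g., weak convergence and asymptotic power) of our estimation and inference procedures. We conduct extensive simulations to investigate the finite-sample performance of the proposed estimation and inference procedures. We examine how our VCDP models can help improve policy evaluation for various dispatching and dispositioning policies at Didi.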